Detecting scene cuts in a video sequence
Patent abstract:
A method of detecting scene cuts in a video sequence comprised of a plurality of frames, the method comprising: computing motion vectors for a first frame and a second frame in the plurality of frames; determining a motion cost value for the computed motion vectors of the first frame and the second frame; determining a ratio between the motion cost value for the computed motion vectors of the first frame and the motion cost value for the computed motion vectors of the second frame; and determining if there is a scene cut between the first frame and the second frame based on the ratio.
Publication number: SE535206C2
Application number: SE0700732
Filing date: 2004-05-13
Publication date: 2012-05-22
Inventors: Barin Geoffry Haskell; Adriana Dumitras; Atul Puri
Applicant: Apple Inc.
IPC main class:
Patent description:
The number of bidirectional motion-compensated (B) frames to be encoded between intra (I) frames or unidirectional predicted (P) frames is a coding decision that significantly affects the bit rate of the subsequently compressed video bit stream. A video encoder must determine which method, among all conceivable methods (or modes), is best for encoding each pixel block, and how many B frames, if any, are to be encoded between each I or P frame. Thus, effective methods are needed for determining the number of B frames to be encoded between the I or P frames of a video sequence.

SUMMARY OF THE INVENTION
The present invention provides methods for encoding frames in a video sequence where the sequence is processed in two rounds. During the first round, motion vectors are calculated for the pixel blocks of each frame in a set of consecutive frames, with reference to another specific frame or frames. In some embodiments, motion compensated errors (MCEs) are also calculated for the pixel blocks of each frame. A motion cost value is then determined for each frame, where the motion cost value relates to the number of bits needed to encode the motion vectors and/or the value of the MCEs of the frame's pixel blocks. A derived cost value is then calculated based on the motion cost value of at least one frame (for example, a derived cost value may be the motion cost value of a single frame, the average of the motion cost values of two or more frames, or the ratio of the motion cost value of a first frame to the motion cost value of a second frame). In addition, in the first round, the derived cost value is used to determine the number of B frames (NB) to be encoded in the set of consecutive frames. The number of B frames (NB) to be encoded increases as long as the derived cost value is lower than a predetermined threshold value.
In the second round, frame NB+1 in the set of consecutive frames is coded as a P frame, and frames 1 to NB are coded as B frames, where some or all of the motion vectors calculated in the first round are reused in the coding process of the second round. In some embodiments, during the first round, motion vectors are calculated for each pixel block of each frame in a set of consecutive frames with reference to the immediately preceding frame. In these embodiments, some of the motion vectors calculated in the first round are reused in the coding process of the second round. In further embodiments, during the first round, motion vectors are calculated for each frame in a set of consecutive frames with reference to the same preceding frame (frame 0 in the set of frames). In these embodiments, all of the motion vectors calculated in the first round are reused in the coding process of the second round. In some embodiments, the derived cost value is the average of the motion costs of a series of consecutive frames. In other embodiments, the derived cost value is the motion cost of a single frame. In further embodiments, the derived cost value is the ratio between the motion cost of a first frame and the motion cost of a second frame that immediately follows the first frame. In these further embodiments, the motion cost ratio is used to detect an impulse-like increase in motion cost between two consecutive frames, which typically indicates a scene cut between the two consecutive frames. As such, these further embodiments provide a scene cut detection method used in conjunction with the two-round coding methods of the present invention. In yet further embodiments, the scene cut detection method is used independently of the two-round coding method.

BRIEF DESCRIPTION OF THE DRAWINGS
The novel features of the invention are set forth in the appended claims.
However, for explanatory purposes, a number of embodiments are presented in the following figures.
Fig. 1 shows a coding system with coding and decoding components.
Fig. 2 is a graphical illustration of the frames of a video sequence in display order.
Fig. 3 shows the frames of the video sequence of Fig. 2 in transmission order.
Fig. 4 is a graphical illustration of a set of consecutive frames processed in two rounds with partial reuse of the motion vectors calculated in the first round.
Fig. 5 is a flow diagram of a method for encoding a video sequence in two rounds, where some of the motion vectors calculated in the first round are reused in the second round.
Fig. 6 is a flow diagram of the partial two-round reuse method of Fig. 5 combined with a scene cut detection method in accordance with the present invention.
Fig. 7 shows a diagram of the motion cost per frame for a series of frames in a video sequence.
Fig. 8 is a flow diagram of a method for detecting scene cuts in a video sequence.
Fig. 9 is a graphical illustration of a set of consecutive frames processed in two rounds with complete reuse of the motion vectors calculated in the first round.
Fig. 10 is a flow diagram of a method for encoding a video sequence in two rounds, where all of the motion vectors calculated in the first round are reused in the second round.
Fig. 11 is a flow diagram of the complete two-round reuse method of Fig. 10 combined with a scene cut detection method in accordance with the present invention.
Fig. 12 shows a computer system in which some embodiments of the invention are implemented.

DETAILED DESCRIPTION OF THE INVENTION
In the following description, numerous details are set forth for explanatory purposes. However, a person skilled in the art will appreciate that the invention may be practiced without these specific details.
In other instances, well-known structures and devices are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. The present invention provides methods for encoding the frames of a video sequence in which the frames are processed in two rounds. In the first round, motion vectors are calculated for the pixel blocks of each frame in a set of consecutive frames, with reference to another specific frame or frames. In some embodiments, motion compensated errors (MCEs) are also calculated for the pixel blocks of each frame. A motion cost value for each frame is then determined, where the motion cost value is related to the number of bits needed to encode the motion vectors and/or the value of the MCEs of the frame's pixel blocks. A derived cost value is then calculated based on the motion cost value of at least one frame (for example, the derived cost value may be the motion cost value of a single frame, the average of the motion cost values of two or more frames, or the ratio of the motion cost value of a first frame to that of a second frame). In addition, in the first round, the derived cost value is used to determine the number of B frames (NB) to be encoded in the set of consecutive frames. The number of B frames (NB) to be encoded increases as long as the derived cost value is lower than a predetermined threshold value. In the second round, frame NB+1 in the set of consecutive frames is coded as a P frame, and frames 1 to NB are coded as B frames, where some or all of the motion vectors calculated in the first round are reused in the coding process of the second round. In some embodiments, in the first round, motion vectors are calculated for each frame in a set of consecutive frames with reference to the immediately preceding frame. In these embodiments, some of the motion vectors calculated in the first round are reused in the coding process of the second round.
In other embodiments, in the first round, motion vectors are calculated for each frame in a set of consecutive frames with reference to the same preceding frame (frame 0 in the set of consecutive frames). In these embodiments, all of the motion vectors calculated in the first round are reused in the coding process of the second round. In some embodiments, the derived cost value is the average of the motion costs of a series of consecutive frames. In other embodiments, the derived cost value is the motion cost of a single frame. In further embodiments, the derived cost value is the ratio between the motion cost of a first frame and the motion cost of a frame immediately preceding the first frame. In these further embodiments, the motion cost ratio is used to detect an impulse-like increase in motion cost between two consecutive frames, which typically indicates a scene cut between the two consecutive frames. As such, these further embodiments provide a scene cut detection method used in conjunction with the two-round coding method of the present invention. In other embodiments, the scene cut detection method may be used independently of the two-round coding method. Some embodiments described below relate to video frames in YUV format. A person of ordinary skill in the art, however, will realize that these embodiments may also relate to a variety of formats other than YUV. In addition, other video formats (such as RGB) can easily be transformed into the YUV format. Furthermore, embodiments of the present invention may relate to various video coding applications (for example, DVD, digital storage media, television broadcasting, internet streaming, communication, teleconferencing, etc.) in real time or non-real time. Embodiments of the present invention can also be used with video sequences coded under other coding standards, such as H.263 and H.264 (also known as MPEG-4 Part 10).
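As a concrete illustration of the remark above that RGB can easily be transformed into the YUV format, the following sketch applies the BT.601 conversion matrix. The specific coefficients are not given in the text and are assumed here; other standards (e.g., BT.709) use different coefficients.

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit R'G'B' values to Y'UV using the BT.601 matrix.

    Coefficients are the conventional BT.601 values; they are an
    assumption for illustration, not taken from the patent text.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b          # luminance
    u = -0.14713 * r - 0.28886 * g + 0.436 * b     # blue-difference chroma
    v = 0.615 * r - 0.51499 * g - 0.10001 * b      # red-difference chroma
    return y, u, v
```

For pure white (255, 255, 255), the luminance equals 255 and both chroma components are approximately zero, as expected.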
In the vocabulary used here, a pixel block is a set of pixels in a frame, also known in the art as a block, macroblock or sub-block. Furthermore, according to the vocabulary used here, a first pixel block in a first frame is said to "refer" to a second pixel block in a second frame when information from the second pixel block is used to calculate a motion vector for the first pixel block. Thus, a first frame is said to "refer" to a second frame when a pixel block in the first frame "refers" to a pixel block in the second frame. In addition, by calculating the motion vectors or motion compensated errors (MCEs) of a frame, it is meant that motion vectors or MCEs are individually calculated for the pixel blocks of the frame. In addition, as used here, a set of consecutive frames is a subset of frames in a video sequence where the frames are consecutive. The various embodiments described below provide a method for encoding a video sequence in two rounds. The first round calculates motion vectors for a set of consecutive frames in the video sequence and determines the number of B frames to be encoded in the set of consecutive frames. The second round encodes the determined number of B frames by using some or all of the motion vectors determined in the first round (that is, by partially or completely reusing the motion vectors determined in the first round). Embodiments relating to partial reuse of the motion vectors determined in the first round are described in Section I. Embodiments relating to complete reuse of the motion vectors determined in the first round are described in Section II. Embodiments of a scene cut detection method, which can be used in conjunction with the partial or complete reuse methods or used independently, are given in Sections I and II. Fig. 1 illustrates an encoding system 100 with encoding and decoding components 110 and 115.
In some systems, the encoding system 100 includes a pre-processing component preceding the encoding component 110 and a post-processing component following the decoding component 115. As shown in Fig. 1, an original video sequence is received by the encoding component 110, where the original video sequence consists of a number of video frames. Each frame in the video sequence consists of a number of pixels at pixel positions, where each pixel position contains one or more pixel values (for example, values for luminance (Y) and/or chrominance (U, V), the color values for a given point in the image). Each frame is divided into subsets of pixels called pixel blocks. The encoding component 110 encodes each frame (for example, as an I, P or B frame) in the video sequence to produce an encoded (compressed) video sequence. The number of B frames to be coded between each I or P frame is determined by the encoding component 110 and is a decision that significantly affects the bit rate of the subsequently compressed video bit stream. The encoding component 110 implements methods of the present invention to determine the number, if any, of B frames to be encoded between each I or P frame. The encoded video sequence is then transmitted to and received by the decoding component 115, which processes the encoded video sequence to produce a decoded video sequence for display. Since the B frames of the encoded video sequence use information from future frames, the transmission order of the frames of the encoded video sequence (as transmitted to the decoding component 115) is typically different from the display order of the frames of the decoded video sequence (as produced by the decoding component 115). Thus, the transmission order of the frames of the encoded video sequence is arranged so that when it is time for the decoding component 115 to decode a B frame, the decoding component 115 has already received and stored the information from the I frames or P frames needed to decode the B frame. Fig.
2 is a graphical illustration of frames 210 belonging to a video sequence 200, in display order. As shown in Fig. 2, each frame 210 is identified by a frame type (I, P or B) and a numerical index (1, 2, 3, ...) indicating the display order of the frames in the video sequence 200. In the example shown in Fig. 2, the set of consecutive frames from I1 to P5 contains three B frames (B2, B3 and B4) coded between the I frame and the P frame. Likewise, the set of consecutive frames from P5 to P10 contains four B frames (B6, B7, B8 and B9) encoded between the P frames. Arrows 220 indicate motion vector calculations that use information from one frame (at the start of an arrow 220) to calculate motion vectors for another frame (at the end of the arrow 220). For example, as shown in Fig. 2, information from frame I1 is used to calculate motion vectors for frames B2, B3, B4 and P5. As such, frames B2, B3, B4 and P5 are said to refer to frame I1. In addition, information from frame P5 is used to calculate motion vectors for frames B2, B3 and B4. As such, frames B2, B3 and B4 are said to refer to frame P5. After decoding by a decoding component, the video sequence 200 is displayed in the display order shown in Fig. 2. However, since the B frames use information from future frames, the transmission order of the frames in the video sequence 200 differs from the display order shown in Fig. 2. Fig. 3 graphically illustrates the frames of the video sequence 200 of Fig. 2 in transmission order. The video sequence 200 is received in transmission order by the decoding component 115 for decoding. As shown in Fig. 3, for example, when it is time to decode frame B2, the decoding component 115 will already have received and stored the information for frames I1 and P5 that is needed to decode frame B2. Similarly, the decoding component 115 will already have received and stored the information for frames I1 and P5 that is needed to decode frames B3 and B4.
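The reordering from display order to transmission order described above can be sketched as follows. This is a hypothetical illustration assuming the simple referencing pattern of Fig. 2, where each B frame references the nearest following I or P frame; real encoders with stored B frames or multiple reference frames need a more general scheme.

```python
def transmission_order(frames):
    """Reorder frames from display order to transmission order.

    Each B frame references the nearest following I/P anchor frame,
    so that anchor must be transmitted before the B frames it anchors.
    `frames` is a list of (type, index) pairs in display order.
    """
    out, pending_b = [], []
    for frame in frames:
        if frame[0] == 'B':
            pending_b.append(frame)   # hold until the next anchor is sent
        else:                          # I or P frame: send it, then the held Bs
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b
```

For the sequence of Fig. 2, I1 B2 B3 B4 P5 B6 B7 B8 B9 P10 in display order, this yields the transmission order I1 P5 B2 B3 B4 P10 B6 B7 B8 B9, matching the behavior described for Fig. 3: when B2 is decoded, I1 and P5 have already been received.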
In this connection, I frames and P frames are often referred to as stored frames, since they are stored for use in the predictive coding of other frames. However, in some scenarios the B frames can also be stored and used to predict subsequent frames. Likewise, if multiple stored frames are available, two past or two future frames can be used to predictively encode each B frame. The frames are then rearranged from transmission order to display order (as shown in Fig. 2) for proper display. The coding of a P frame typically uses motion-compensated prediction, where a motion vector is calculated for each pixel block in the frame by referring to pixel blocks of a previous frame. A predicted pixel block is then created by using the motion vector and translating the pixels of the previous frame referenced by the P frame. The difference between the actual pixel block in the P frame and the created predicted block (referred to as the prediction error or motion compensated error) is then coded for transmission. In addition, each motion vector can be transmitted via predictive coding. This means that a prediction is created by using nearby motion vectors that have already been transmitted, and then the difference between the current motion vector and the prediction is coded for transmission. The motion compensated error (MCE) of a pixel block measures the difference between a first pixel block in a first frame and a displaced, motion-compensated second pixel block in a second frame referenced by the first pixel block, where the magnitude of the displacement is given by the motion vector of the first pixel block. The MCE can be the sum of the squared differences between the pixel values (for example, the differences in the luminance values (Y)) in the first pixel block and the corresponding pixel values in the displaced second pixel block.
Alternatively, the MCE can be the sum of the absolute differences between the pixel values in the first pixel block and the corresponding pixel values in the displaced second pixel block. In some embodiments, a block transform, such as the Hadamard transform, is applied to the MCE before summing. The MCE of a pixel block is related to the cost in bits needed to encode the pixel block (the larger the MCE, the more bits are needed to encode the MCE and, consequently, to encode the pixel block). Each B pixel block typically uses two motion vectors, one calculated from information from a previous frame and one calculated from information from a future frame. From these motion vectors, two predicted pixel blocks are created, which are usually averaged to form a final predicted pixel block. The averaging can, for example, use equal 50/50 weighting, or unequal weighting can be used. The difference (that is, the prediction error or motion compensated error) between the actual pixel block in the B frame and the final predicted block is then coded for transmission. As with P pixel blocks, the motion vectors of each B pixel block can also be transmitted via predictive coding. That is, a prediction is created using nearby motion vectors that have already been transmitted, and then the difference between the actual motion vector and the prediction is coded for transmission. However, with B pixel blocks there is also the possibility of interpolating the motion vectors from the adjacent pixel blocks in the stored frames. The interpolated value can then be used as a prediction, and the difference between the current motion vector and the prediction is encoded for transmission. Such interpolations are performed at both the encoder and the decoder. In some cases the interpolated motion vector is good enough to be used without any correction, in which case no motion vector data needs to be transmitted.
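A minimal sketch of the MCE computation described above, assuming frames are plain 2-D lists of luma (Y) values. The function name and signature are illustrative, not taken from the patent; the optional Hadamard transform mentioned in the text is omitted for brevity.

```python
def motion_compensated_error(block, ref_frame, mv, top, left, use_ssd=False):
    """MCE between a pixel block and its motion-displaced reference block.

    `block` is a 2-D list of luma values, `ref_frame` a 2-D list for the
    referenced frame, (top, left) the block's position in its own frame,
    and mv = (dy, dx) the motion vector giving the displacement.
    Returns the sum of absolute differences (SAD) or, with use_ssd=True,
    the sum of squared differences (SSD), as described in the text.
    """
    err = 0
    for i, row in enumerate(block):
        for j, pix in enumerate(row):
            ref = ref_frame[top + mv[0] + i][left + mv[1] + j]
            d = pix - ref
            err += d * d if use_ssd else abs(d)
    return err
```

A larger MCE means more bits are needed to encode the residual, which is why the MCE feeds into the motion cost defined later in Section I.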
This is referred to as Direct Mode in H.263 and H.264. Direct Mode works especially well when the camera is slowly panning across a stationary background. In fact, the interpolation can be good enough to be used as it is, which means that no correction information needs to be transmitted for the motion vectors of these B pixel blocks. Even within each frame, the pixel blocks can be encoded in many ways. For example, a pixel block can be divided into several smaller sub-blocks, with motion vectors calculated and transmitted for each sub-block. The shape of the sub-blocks can vary and need not be square. Within a P frame or B frame, some pixel blocks can be coded without the use of motion compensation, i.e., they are encoded as intra (I) pixel blocks. Within a B frame, some pixel blocks may be better coded using one-way motion compensation, i.e., they are coded as forward predicted or backward predicted, depending on whether a previous frame or a future frame is used in the prediction. Prior to transmission, the motion compensated errors of a pixel block are typically transformed by an orthogonal transform, such as the discrete cosine transform or an approximation thereof. The result of the transform operation is a set of transform coefficients, equal in number to the number of transformed pixels in the pixel block. At the decoding component, the received transform coefficients are inverse transformed in order to recover the motion compensated error values for further use in the decoding. However, not all transform coefficients need to be sent to the decoding component for acceptable video quality. Depending on the available bit rate of the transmission, a number of the transform coefficients may be deleted and not transmitted. In the decoder, their values are replaced by zeros before the inverse transform. For transmission, the transform coefficients are typically quantized and entropy coded.
Quantization involves representing the transform coefficient values by a finite set of possible values, which reduces the accuracy of the transmission and often forces small values to zero, further reducing the number of transmitted coefficients. In quantization, each transform coefficient is typically divided by a quantizer step size Q and rounded to the nearest integer. The integers are then entropy coded using variable word-length codes such as Huffman codes or arithmetic codes. Finally, the size and shape of the pixel block used for motion compensation need not be the same as the size and shape of the pixel block used for the transform. For example, 16 x 16, 16 x 8, 8 x 16 pixels or smaller sizes are usually used for motion compensation, while 8 x 8 or 4 x 4 are usually used for transforms. Indeed, the size and shape of the pixel blocks, for motion compensation and for transformation, can vary from pixel block to pixel block. An encoding component must select the best encoding method from all possible encoding methods (or modes) to be used to encode each pixel block. The combination of the removal of transform coefficients, the quantization of the transform coefficients that are transmitted, and the mode selection results in a reduction of the bit rate R used for transmission. It also leads to a distortion D in the decoded video. The coding component must also determine how many B frames, if any, are to be coded between each I or P frame. A "brute-force" approach would simply encode every combination of B frames and pick the combination that minimizes the bit rate. However, this method is too complex. It also requires a very large number of "trial and error" coding operations and statistics collection, most of which must be discarded once a final decision is made.
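The quantization step described above, dividing each transform coefficient by the step size Q and rounding to the nearest integer, can be sketched as follows. The helper names are illustrative, not from the patent.

```python
def quantize(coeffs, q_step):
    """Quantize transform coefficients: divide by step size Q and round.

    Small coefficients are forced to zero, as described in the text,
    which reduces the number of coefficients that must be transmitted.
    """
    return [int(round(c / q_step)) for c in coeffs]

def dequantize(levels, q_step):
    """Decoder-side reconstruction: multiply the integer levels back by Q."""
    return [lvl * q_step for lvl in levels]
```

Note how a coefficient of -3 with Q = 8 quantizes to zero, illustrating the loss of precision that trades distortion D for a lower bit rate R.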
SECTION I: PARTIAL REUSE OF MOTION VECTORS DETERMINED IN A FIRST ROUND
In some embodiments, a method of encoding frames in a video sequence involves processing the frames in two rounds, with some motion vectors being reused in the second round. During the first round, motion vectors for the pixel blocks of each frame are calculated as if the final coding were to be performed using all P frames, with each frame using information from the previous frame. In an alternative embodiment, motion compensated errors are also calculated for each pixel block during the first round. The system then calculates the motion cost for each pixel block in a frame. In many embodiments, the motion cost of a pixel block is defined as the total number of bits needed to encode the motion vector and/or the MCE value calculated for the pixel block. In some embodiments, the motion cost of a pixel block is given by the following equation:

motion cost of a pixel block = (λ × motion_vector_bits) + MCE

where λ is a Lagrange multiplier. For λ = 0, the motion cost of the pixel block is equal to the value of the motion compensated error (MCE). If the Lagrange multiplier λ is large, the motion cost of the pixel block is dominated by the number of bits needed to encode the motion vector. A frame motion cost (FMCi) is the sum of the motion costs of the pixel blocks in frame i. For m consecutive frames in a video sequence, an average frame motion cost (AFMCm) is calculated by adding the individual frame motion costs of the m consecutive frames and dividing the result by m, as expressed by the following equation:

AFMCm = (Σ i=1 to m FMCi) / m

for a series of m video frames numbered 1 to m. In some embodiments, the AFMC values for a set of consecutive frames are used to determine the number of B frames to be encoded between each I or P frame in the set of consecutive frames (as described below in connection with Fig. 5).
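The per-block motion cost, frame motion cost (FMC) and average frame motion cost (AFMC) formulas above can be sketched directly. The function names are illustrative; the Lagrange-multiplier form follows the equation in the text.

```python
def pixel_block_motion_cost(mv_bits, mce, lam):
    """Motion cost of a pixel block = (lambda * motion_vector_bits) + MCE."""
    return lam * mv_bits + mce

def frame_motion_cost(block_costs):
    """FMC_i: sum of the motion costs of the pixel blocks in frame i."""
    return sum(block_costs)

def average_frame_motion_cost(fmc_values):
    """AFMC_m = (FMC_1 + ... + FMC_m) / m for m consecutive frames."""
    return sum(fmc_values) / len(fmc_values)
```

With λ = 0 the block cost reduces to the MCE alone, matching the special case noted in the text.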
In other embodiments, the FMC values of individual frames are used to determine the number (NB) of B frames. In further embodiments, the ratio (FMCm)/(FMCm-1) between the motion cost of a first frame and the motion cost of a second frame preceding the first frame is used to determine the number (NB) of B frames. In the second round, a number (NB) of frames are coded as B frames and frame NB+1 is coded as a P frame, where NB was determined in the first round as described above. During the second round, some of the motion vectors calculated in the first round are reused to encode the B frames and/or P frames (that is, the second round partially reuses motion vectors calculated in the first round). Fig. 4 is a graphical illustration of a set of consecutive frames 400 that are processed in two rounds with partial reuse of the motion vectors calculated in the first round. The set of consecutive frames 400 consists of five frames 410, shown in display order. As shown in Fig. 4, for the first and second rounds, each frame 410 is identified by a frame type (I, P or B) and an index number (1 to 4) indicating the display order of the frame in the set of consecutive frames 400. Arrows 420 indicate motion vector calculations that use information from one frame (at the beginning of an arrow 420) to calculate motion vectors for another frame (at the end of the arrow 420). Arrows 420 displayed above the frames 410 relate to motion vector calculations occurring in the first round, and arrows 420 displayed below the frames 410 relate to motion vector calculations occurring in the second round. For example, in the first round, information from frame I0 is used to calculate motion vectors for frame P1. In the second round, information from frames P4 and I0 is used to calculate motion vectors for frame B2. As shown in Fig.
4, in the first round, motion vectors for frames 1 to 4 are calculated using information from the respective previous frame (that is, the motion vectors for frames 1 to 4 are calculated as if the final coding were to be performed using all P frames). In the first round, the number of B frames to be coded in the set of consecutive frames 400 is also determined (as described below in connection with Fig. 5). In the example shown in Fig. 4, it has been determined that three B frames should be encoded in the set of consecutive frames 400. In the second round, the determined number of B frames (namely frames B1, B2 and B3) are coded, with the next frame coded as a P frame (frame P4). As shown in Fig. 4, new motion vector calculations must be performed in the second round to calculate the new motion vectors needed to encode the B frames and the P frame. For example, arrows 425 and 430 indicate new motion vector calculations performed in the second round. In order for a motion vector from the first round to be reused in the second round, one condition must be met: 1) the information needed to calculate the motion vector must be the same in both rounds (that is, the information must originate from the same frame in both rounds). For example, as shown in Fig. 4, information from frame I0 is used to calculate motion vectors for frame P1 in the first round. In other words, frame I0 is referenced by frame P1 in the first round. In the second round, information from frame I0 is also needed to calculate motion vectors for frame B1. In other words, frame I0 is also referenced by frame B1 in the second round. As such, the motion vectors calculated for frame P1 in the first round can be reused to encode frame B1 in the second round. The motion vector calculations that produce motion vectors for frame P1 in the first round, and that are reused in the second round, are indicated by the dashed arrow 435.
Since a B frame typically uses two motion vectors, one calculated from a previous frame and one from a subsequent frame, the coding of frame B1 in the second round also requires a new motion vector calculated with reference to frame P4. Fig. 5 is a flow diagram of a method 500 for encoding a video sequence in two rounds, where some of the motion vectors calculated in the first round are reused in the second round. The method 500 may be performed, for example, by the encoding component 110. The method 500 begins with the reception (at 505) of a video sequence. The video sequence comprises a plurality of video frames indexed by m, from 0 to N, where m is a positive integer. Each frame in the video sequence comprises a number of pixels, which have one or more pixel values (for example, luminance (Y) and/or chrominance (U, V) values). Each frame is divided into sets of pixels called pixel blocks. The method then encodes (at 510) frame 0 of the plurality of video frames as an I frame. The method then sets (at 515) a counter m to the value 1. The method calculates (at 520) motion vectors for the pixel blocks of frame m, using information from the preceding frame m-1; that is, the motion vectors for the pixel blocks in frame m are calculated as if frame m were coded as a P frame. A graphical illustration of step 520 is shown in Fig. 4. The method determines (at 525) a motion compensated error (MCE) for the pixel blocks in frame m. In an alternative embodiment, the MCE is not calculated for the pixel blocks in frame m. The method then determines (at 530) an average frame motion cost (AFMCm) for frames 1 to m, where the average frame motion cost is calculated by adding the individual frame motion costs for frames 1 to m and dividing the result by m.
The motion cost of an individual frame is calculated by summing the motion costs of the pixel blocks in the frame, where the motion cost of a pixel block is proportional to the MCE value of the pixel block and the total number of bits needed to encode its motion vector. In an alternative embodiment, the motion cost of the pixel block consists solely of the MCE value of the pixel block. In further embodiments, the motion cost of the pixel block consists solely of the total number of bits needed to encode the motion vector of the pixel block. In a further embodiment, the method determines (at 530) the frame motion cost (FMCm) for each individual frame m. The method then determines (at 535) whether the AFMCm value is less than a predetermined threshold value T. In some embodiments, the threshold value T is different for different types of video sequences; for example, different threshold values can be used for sports sequences, soap operas, news reports, old films, video conferencing, etc. In further embodiments, the method determines (at 535) whether the AFMCm value is less than a predetermined threshold value Tm, where Tm represents a set of predetermined threshold values that vary depending on the value of the counter m. In some embodiments, the values of Tm decrease monotonically as the value of m increases, thereby making it progressively more difficult to increase the number of B frames to be coded as the counter m grows. In another embodiment, the method determines (at 535) whether the FMCm value of the frame is lower than a predetermined threshold value T. If the method determines (at 535) that the AFMCm value is less than the predetermined threshold value T, the method increases (at 540) the counter m by 1 and continues at step 520. If the method determines (at 535) that the AFMCm value is not less than the predetermined threshold value T, the method sets (at 545) a variable n equal to the maximum of 1 and (m-1), so that the minimum value of n is 1.
Note that the value of n represents the largest number of frames for which the average frame motion cost (AFMCn) is still lower than the predetermined threshold value T. The method also sets the number of B frames (NB) to be coded to n-1.

The first round of operation of method 500 consists of steps 515 to 545 and determines the number of B frames (NB) to be coded in the set of consecutive frames from 1 to n. The number of B frames (NB) to be coded increases until the average frame motion cost (AFMCm) of the consecutive frames reaches the predetermined threshold value T; NB is then set based on the largest value of n for which AFMCn is still less than the predetermined threshold value T. As such, the number of B frames (NB) to be encoded in a set of consecutive frames depends on the average frame motion cost for that set of consecutive frames. The second round of operation of method 500 consists of steps 550 to 555. At step 550, the method encodes frame n as a P frame. The method then encodes (at 555) frames 1 to NB as B frames, by reusing some motion vectors calculated in the first round (steps 515 to 545). In some embodiments, motion vectors for the pixel blocks in frame 1, calculated in the first round, are reused to calculate motion vectors for the pixel blocks in frame 1 in the second round, as graphically illustrated in Fig. 4. The method 500 then determines (at 560) whether frame n is the last frame in the video sequence (i.e., whether frame n is frame N). If so, the method terminates. If frame n is not the last frame in the video sequence, the method re-indexes frame n as frame 0 in the video sequence, so that frame n+1 is re-indexed as frame 1, frame n+2 as frame 2, and so on. The method then proceeds to step 515. In an alternative embodiment, the partial two-round motion vector reuse method is adapted, as shown in Fig.
5, to encode a video sequence with a relatively high number of scene cuts (i.e., a relatively high number of frames with discontinuous content). Such video sequences can be found, for example, in music videos, sports broadcasts, etc. In the alternative embodiment, a scene cut detection method is used, in which an impulse-like variation in the motion cost of two consecutive frames is monitored (which typically indicates a scene change).

Fig. 6 shows a flow chart of the partial two-round reuse method of Fig. 5 combined with a scene cut detection method in accordance with the present invention. The method of Fig. 6 is similar to the method of Fig. 5 and only the steps that differ are described in detail here. The method 600 may be performed, for example, by the coding component 110. Method 600 begins with the reception (at 605) of a video sequence. The video sequence comprises a number of video frames indexed from 0 to N, where each frame is divided into sets of pixels called pixel blocks. The method then encodes (at 610) frame 0, of the plurality of video frames, as an I frame. The method then sets (at 615) a counter m to 1. The method calculates (at 620) motion vectors for the pixel blocks of frame m, using information from the previous frame. The method then determines (at 625) a motion compensated error (MCE) for the pixel blocks of frame m. In an alternative embodiment, the MCE of the pixel blocks in frame m is not calculated. The method then determines (at 630) the frame motion cost (FMCm) for frame m, by summing the total number of bits needed to encode the motion vector and/or the MCE value for each pixel block in frame m. The method also determines (at 630) the frame motion cost (FMCm-1) of the frame immediately preceding frame m, by summing the number of bits needed to encode the motion vector and/or the MCE value of each pixel block in frame m-1. The method then calculates (at 630) the value of FMCm/FMCm-1.
During the first iteration of the method 600, where frame m-1 is an I frame, the value of FMCm/FMCm-1 is instead set to 0. Since the frame motion cost of an I frame is 0, this prevents a division operation in which the denominator is 0. The method then determines (at 635) whether the value of FMCm/FMCm-1 is less than a predetermined threshold value C. In some embodiments, the threshold value C is determined experimentally. In some embodiments, the threshold value C is different for different types of video sequences, i.e. different threshold values can be used for video sequences related to sports, soap operas, news reports, old films, video conferencing, etc. In further embodiments, the method determines (at 635) whether the value of FMCm/FMCm-1 is less than a predetermined threshold value Cm, where Cm represents a set of predetermined threshold values that vary depending on the value of the counter m. The value of m indicates the current number of frames to be coded as B frames since the last coding of a P frame in the video sequence. For example, the value of Cm may decrease monotonically as the value of m increases, thereby making it harder to code more frames as B frames as m grows.

The value of FMCm/FMCm-1 reflects the change in motion cost between two consecutive frames (frames m and m-1). A relatively high value of FMCm/FMCm-1 indicates that the motion cost of frame m is significantly higher than the motion cost of frame m-1. Such an impulse-like increase in motion cost between two consecutive frames typically indicates a scene cut between the two consecutive frames, i.e. the contents of the two consecutive frames are not continuous. The predetermined threshold value C (against which the value of FMCm/FMCm-1 is compared at step 635) can be determined experimentally, so that a value of FMCm/FMCm-1 equal to or greater than the threshold value C probably indicates a scene cut between frame m and frame m-1.
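The ratio test at steps 630 and 635 can be sketched minimally as follows. Function names are illustrative assumptions; the convention of forcing the ratio to 0 after an I frame follows the text above.

```python
def motion_cost_ratio(fmc_m, fmc_prev):
    # FMC_m / FMC_{m-1}; defined as 0 when the preceding frame is an
    # I frame, whose frame motion cost is 0 (avoids dividing by zero).
    return 0.0 if fmc_prev == 0 else fmc_m / fmc_prev

def is_scene_cut(fmc_m, fmc_prev, c):
    # A ratio at or above the threshold C signals an impulse-like jump
    # in motion cost, i.e. a likely scene cut between frames m-1 and m.
    return motion_cost_ratio(fmc_m, fmc_prev) >= c
```

For example, a frame whose motion cost is fifty times that of its predecessor would exceed any experimentally chosen C of a few units, while near-equal consecutive costs would not.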
If the method determines (at 635) that the value of FMCm/FMCm-1 is less than the predetermined threshold value C, the method increases (at 640) the counter by one and continues at step 620. If the method determines (at 635) that the value of FMCm/FMCm-1 is not less than the predetermined threshold value C, the method sets (at 645) a variable n to the greater of 1 and (m-1) and also sets the number of B frames (NB) to be encoded to n-1. The first round of operation of method 600 consists of steps 615 to 645 and determines the number of B frames (NB) to be encoded in the set of consecutive frames from 1 to n. Note that the value of n represents the largest number of frames for which the value of FMCm/FMCm-1 is still less than the predetermined threshold value C. Therefore, frame n is the frame immediately before a detected scene cut and frame n+1 is the frame immediately after the detected scene cut (as discussed below in relation to Fig. 7).

The second round of operation of method 600 consists of steps 650 to 655. At step 650, the method encodes frame n (the frame immediately before the detected scene cut) as a P frame. As such, the method of Fig. 6 ensures that the frame immediately before a scene cut is coded as a P frame. This ensures that the frame before the scene cut does not reference a frame placed after the scene cut (if that were to happen, the bit rate would increase). The method then encodes (at 655) frames 1 to NB as B frames, reusing some motion vectors calculated in the first round (steps 615 to 645). Method 600 then determines (at 660) whether frame n is the last frame in the video sequence. If so, the method terminates. If frame n is not the last frame in the video sequence, the method re-indexes frame n as frame 0 in the video sequence, so that frame n+1 is re-indexed as frame 1, frame n+2 as frame 2, and so on. The method then proceeds to step 615. Fig.
7 shows a diagram 700 of the motion cost per frame in a series of frames of a video sequence. Fig. 7 is described in relation to Fig. 6. The motion cost per frame is displayed on a first axis 705 and the frame number of the individual frames in the video sequence is shown on a second axis 710. The motion costs of individual frames are indicated by points 715. A frame immediately before a scene cut is marked by a dot inside a circle 720 and a frame immediately after a scene cut by a dot inside a square 725. Frames immediately before a scene cut are coded as P frames (at step 650). Note that the motion cost of a frame immediately after a scene cut is significantly higher than the motion cost of the frame immediately before the scene cut. As such, the ratio between the motion cost of the frame immediately after a scene cut and the motion cost of the frame immediately before the scene cut (i.e. FMCm/FMCm-1) has a relatively high value. A relatively high value of FMCm/FMCm-1 typically indicates a scene cut between the two consecutive frames (frames m and m-1) and is likely to exceed the predetermined threshold value C against which the value of FMCm/FMCm-1 is compared (at step 635).

The scene cut detection method discussed in connection with Fig. 6 provides a simple and effective way of detecting scene cuts in a video sequence. In an alternative embodiment, the scene cut detection method is not used in conjunction with the partial two-round reuse method of Fig. 5, but is used independently to identify scene cuts in a video sequence. Fig. 8 is a flow chart of a method 800 for identifying scene cuts in a video sequence. As used herein, a video sequence with a relatively high number of scene cuts has a relatively high number of frames with discontinuous content, a scene cut occurring at a point in time when the content of the video sequence is discontinuous. The method of Fig.
8 is similar to the method of Fig. 6 and only the steps that differ are discussed here in detail. Method 800 may be performed, for example, by the coding component 110. Method 800 is initiated by the reception (at 805) of a video sequence. The video sequence comprises a number of video frames indexed from 0 to N, where each frame is divided into sets of pixels called pixel blocks. The method then sets (at 815) a counter m equal to 1. The method calculates (at 820) motion vectors for the pixel blocks in frame m using information from a previous frame, or information from a previous and a future frame, i.e. frame m can be treated as a P frame or a B frame. The method then determines (at 825) a motion compensated error (MCE) for the pixel blocks in frame m. In an alternative embodiment, the MCE is not calculated for the pixel blocks in frame m. The method then determines (at 830) the frame motion cost (FMCm) for frame m and the frame motion cost (FMCm-1) of the frame immediately preceding frame m. The method then calculates (at 830) the value of FMCm/FMCm-1. However, during the first iteration of method 800, where frame m-1 is an I frame, the value of FMCm/FMCm-1 is set to 0. Since the frame motion cost of an I frame is 0, this prevents a division operation in which the denominator is 0. The method then determines (at 835) whether the value of FMCm/FMCm-1 is less than a predetermined threshold value C or Cm. If the method determines (at 835) that the value of FMCm/FMCm-1 is less than the predetermined threshold value C, the method increases (at 840) the counter m by one and continues at step 820. If the method determines (at 835) that the value of FMCm/FMCm-1 is not less than the predetermined threshold value C (i.e. that the value of FMCm/FMCm-1 is equal to or greater than the predetermined threshold value C), the method sets (at 845) a variable n equal to m-1.
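The scan of steps 815 to 845 over a whole sequence can be sketched as a single pass over precomputed per-frame motion costs. This is an illustrative simplification under stated assumptions: it returns all detected cut positions at once instead of re-indexing the sequence as method 800 does, and it treats frame 0 as an I frame with motion cost 0.

```python
def detect_scene_cuts(frame_costs, c):
    # frame_costs[m] is FMC_m for frames 0..N; frame 0 is assumed to be
    # an I frame with cost 0, so its ratio is forced to 0 (no cut there).
    # Returns (n, m) pairs: n = m-1 is the frame immediately before a
    # detected scene cut and m is the frame immediately after it.
    cuts = []
    for m in range(1, len(frame_costs)):
        prev = frame_costs[m - 1]
        ratio = 0.0 if prev == 0 else frame_costs[m] / prev
        if ratio >= c:
            cuts.append((m - 1, m))
    return cuts
```

For a cost series such as 0, 10, 11, 100, 101 with C = 5, only the jump from 11 to 100 produces a ratio above the threshold, marking a cut between frames 2 and 3.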
The method then marks (at 850) frame n as the frame immediately before a detected scene cut and frame m as the frame immediately after the detected scene cut. As such, the method 800 determines, based on the value of the ratio FMCm/FMCm-1, whether there is a scene cut between frame m and frame m-1. The method 800 then determines (at 860) whether frame n is the last frame in the video sequence. If so, the method terminates. If frame n is not the last frame in the video sequence, the method re-indexes frame n as frame 0 in the video sequence, so that frame n+1 is re-indexed as frame 1, frame n+2 as frame 2, etc. The method then proceeds to step 815.

SECTION II: COMPLETE REUSE OF MOTION VECTORS DETERMINED IN THE FIRST ROUND

In most conventional video encoders, the process of calculating motion vectors consumes a significant proportion of the computational resources. When encoding a video sequence, it is therefore advantageous to minimize the number of motion vector calculations as much as possible. In the two-round coding method of the present invention, it is accordingly advantageous to reuse in the second round as many of the motion vectors calculated in the first round as possible, so that fewer new motion vectors need to be calculated in the second round. In an alternative embodiment, all of the motion vectors calculated in the first round are reused in the second round, providing a complete reuse of motion vectors. The complete reuse method is similar to the partial reuse method described above in Section I, except that in the first round motion vectors are calculated for each frame using information from the same reference frame (frame 0) instead of from the previous frame (as in the partial reuse method). During the second round, each motion vector determined in the first round is reused to encode the B frames and P frames in a set of consecutive frames. Fig.
9 is a graphical illustration of a set of consecutive frames 900, which are processed in two rounds with complete reuse of the motion vectors calculated in the first round. The set of consecutive frames 900 consists of five frames 910, shown in display order. As shown in Fig. 9, for the first and second rounds, each frame 910 is identified by a frame type (I, P or B) and an index number (1 to 4) indicating the display order of the frame in the set of consecutive frames 900. Arrows 920 indicate motion vector estimates using information from one frame (at the start of an arrow 920) to calculate motion vectors for another frame (at the end of the arrow 920). The arrows 920 shown above the frames 910 indicate calculations of motion vectors occurring in the first round, and the arrows 920 shown below the frames 910 relate to calculations of motion vectors occurring in the second round. As shown in Fig. 9, in the first round, motion vectors for frames 1 to 4 are calculated using information from the same reference frame (frame 0). As such, motion vectors for frames 1 to 4 are calculated as if the final coding were to be performed using all P frames, with each P frame referring to frame 0. In the example shown in Fig. 9, it has been determined that three B frames should be coded in the set of consecutive frames 900. In the second round, the determined number of B frames is coded (as B1, B2 and B3), with the next frame coded as a P frame (frame P4). As shown in Fig. 9, some new calculations of motion vectors must still be performed in the second round for coding the B frames and the P frame. For example, arrow 925 indicates a new calculation of motion vectors performed in the second round.
Note, however, that all motion vectors calculated in the first round can be reused in the second round, since the condition for reusing a motion vector (that the information derives from the same reference frame in both rounds) is met for each motion vector calculated in the first round. For example, information from frame I0 is used to calculate motion vectors for frame P3 in the first round. In the second round, information from I0 is also needed to calculate motion vectors for frame B3. As such, motion vectors calculated for frame P3 in the first round can be reused to encode frame B3 in the second round. Through a similar analysis, the frames B1, B2 and P4 in the second round reuse the motion vectors calculated in the first round for the respective frames P1, P2 and P4. The calculations of motion vectors which provide motion vectors for a frame in the first round and which are reused in the second round are indicated by dotted arrows 935. Since a B pixel block typically uses two motion vectors, one calculated from a previous frame and one from an upcoming frame, the coding of B1, B2 and B3 in the second round also requires new motion vectors calculated from the information of frame P4.

Fig. 10 is a flow chart of a method 1000 for encoding a video sequence in two rounds, where all motion vectors calculated in the first round are reused in the second round. The method of Fig. 10 is similar to the method of Fig. 5 and only the steps that differ are discussed here in detail. The method 1000 may be performed, for example, by the coding component 110. Method 1000 begins with the reception (at 1005) of a video sequence. The video sequence comprises a number of video frames indexed from 0 to N. Each frame is divided into sets of pixels called pixel blocks. The method then encodes (at 1010) frame 0 from the plurality of frames as an I frame. The method then sets (at 1015) a counter m equal to 1.
The method calculates (at 1020) motion vectors for the pixel blocks in frame m using information from frame 0 (i.e. motion vectors for the pixel blocks in frame m are calculated as if frame m were to be coded as a P frame referring to frame 0). A graphical illustration of step 1020 is shown in Fig. 9. The method then determines (at 1025) a motion compensated error (MCE) for the pixel blocks in frame m. In an alternative embodiment, the MCE for the pixel blocks is not calculated. The method then determines (at 1030) an average frame motion cost (AFMCm) for frames 1 to m. In an alternative embodiment, the method determines (at 1030) a frame motion cost (FMCm) for the individual frame m. The method then determines (at 1035) whether the AFMCm value for frames 1 to m is less than a predetermined threshold value T. In an alternative embodiment, the method determines (at 1035) whether the AFMCm value is less than a predetermined threshold value Tm, where Tm represents a set of predetermined threshold values that vary depending on the value of the counter m. If the method determines (at 1035) that the AFMCm value is less than the predetermined threshold value T, the method increases (at 1040) the counter by one and continues at step 1020. If the method determines (at 1035) that the AFMCm value is not less than the predetermined threshold value T, the method sets (at 1045) a variable n equal to the maximum of 1 and (m-1), so that the minimum value of n is 1. The method also sets the number of B frames (NB) to be coded to n-1. The first round of operation of method 1000 consists of steps 1015 to 1045 and determines the number of B frames (NB) to be encoded in the set of consecutive frames from 1 to n. The second round of operation of method 1000 consists of steps 1050 to 1055. At step 1050, the method encodes frame n as a P frame by reusing certain motion vectors calculated in the first round.
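The complete-reuse structure can be sketched as follows. This is a toy illustration under stated assumptions: `estimate` stands in for a real block-matching motion estimator, frames are opaque values, and the names are invented for the example. The point is only that every first-round vector is computed against frame 0 and can therefore be looked up again in the second round.

```python
def first_round_vectors(frames, estimate):
    # First round of the complete-reuse method: estimate motion vectors
    # for frames 1..N, each with respect to the same reference, frame 0.
    ref = frames[0]
    return {m: estimate(ref, frames[m]) for m in range(1, len(frames))}

def second_round_reuse(vectors, nb):
    # Second round: frames 1..NB become B frames and frame NB+1 the
    # P frame; all of their frame-0-referenced vectors are reused
    # (only the B frames' backward vectors, which reference the new
    # P frame, still have to be computed).
    return {m: vectors[m] for m in range(1, nb + 2)}
```

In contrast, the partial reuse method of Fig. 5 estimates each frame against its immediate predecessor, so only the vectors whose reference frame happens to coincide in both rounds can be looked up this way.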
In some embodiments, motion vectors calculated for the pixel blocks in frame n in the first round are reused to calculate motion vectors for the pixel blocks in frame n in the second round, as illustrated in Fig. 9. The method then encodes (at 1055) frames 1 to NB as B frames by reusing the remaining motion vectors calculated in the first round. In some embodiments, motion vectors for the pixel blocks in frames 1 to NB (treated as P frames) calculated in the first round are used to calculate motion vectors for the pixel blocks in frames 1 to NB (coded as B frames) in the second round, as graphically illustrated in Fig. 9. The method 1000 then determines (at 1060) whether frame n is the last frame in the video sequence. If so, the method terminates. If frame n is not the last frame in the video sequence, the method re-indexes frame n as frame 0 in the video sequence so that frame n+1 is re-indexed as frame 1, frame n+2 as frame 2, etc. The method then proceeds to step 1015.

Fig. 11 is a flow chart of the complete two-round reuse method of Fig. 10 combined with a scene cut detection method in accordance with the present invention. The method of Fig. 11 is similar to the methods of Figs. 6 and 10 and only the steps that differ are discussed in detail here. The method 1100 may be performed, for example, by the coding component 110. Method 1100 is initiated by receiving (at 1105) a video sequence consisting of a plurality of video frames indexed from 0 to N, each frame being divided into sets of pixels called pixel blocks. The method then encodes (at 1110) frame 0 from the plurality of video frames as an I frame. The method then sets (at 1115) a counter m equal to 1. The method calculates (at 1120) motion vectors for the pixel blocks in frame m using information from frame 0. The method then determines (at 1125) a motion compensated error (MCE) for the pixel blocks in frame m. In an alternative embodiment, the MCE is not calculated for the pixel blocks in frame m.
The method then determines (at 1130) the frame motion cost (FMCm) for frame m and the frame motion cost (FMCm-1) of the frame immediately preceding frame m. The method then calculates (at 1130) the value of FMCm/FMCm-1. However, during the first iteration of method 1100, where frame m-1 is an I frame, the value of FMCm/FMCm-1 is set to 0. Since the frame motion cost of an I frame is 0, this prevents a division operation in which the denominator is 0. The method then determines (at 1135) whether the value of FMCm/FMCm-1 is less than a predetermined threshold value C or Cm. If the method determines (at 1135) that the value of FMCm/FMCm-1 is less than the predetermined threshold value C, the method increases (at 1140) the counter m by one and continues at step 1120. If the method determines (at 1135) that the value of FMCm/FMCm-1 is not less than the predetermined threshold value C, the method sets (at 1145) a variable n equal to the maximum of 1 and (m-1) and also sets the number of B frames (NB) to be coded to n-1. The first round of method 1100 comprises steps 1115 to 1145 and determines the number of B frames (NB) to be encoded in the set of consecutive frames from 1 to n. The second round of method 1100 comprises steps 1150 to 1155. At step 1150, the method encodes frame n as a P frame, reusing some motion vectors calculated in the first round. The method then encodes (at 1155) frames 1 to NB as B frames, by reusing the remaining motion vectors calculated in the first round. The method 1100 then determines (at 1160) whether frame n is the last frame in the video sequence. If so, the method terminates. If frame n is not the last frame in the video sequence, the method re-indexes frame n as frame 0 in the video sequence so that frame n+1 is re-indexed as frame 1, frame n+2 as frame 2, and so on. The method then proceeds to step 1115. Fig.
12 shows a computer system in which some embodiments of the invention are implemented. The computer system 1200 includes a bus 1205, a processor 1210, a system memory 1215, a read-only memory (ROM) 1220, a permanent storage unit 1225, an input unit 1230 and an output unit 1235. The bus 1205 collectively represents all system, peripheral and chipset buses that communicatively connect the numerous internal units of the computer system 1200. For example, the bus 1205 communicatively connects the processor 1210 to the ROM 1220, the system memory 1215 and the permanent storage unit 1225. The ROM 1220 stores static data and instructions needed by the processor 1210 and other modules of the computer system. The permanent storage unit 1225, on the other hand, is a read-and-write (RW) memory unit. This unit is a non-volatile memory that stores instructions and data even when the computer system 1200 is turned off. Some embodiments of the invention use a mass storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage unit 1225. Other embodiments use a removable storage device (such as an optical disk or a Zip® disk and its corresponding disk drive) as the permanent storage unit. The permanent storage unit may, for example, contain instructions for applications implementing the methods of the present invention. Like the permanent storage unit 1225, the system memory 1215 is an RW memory unit. However, unlike the storage unit 1225, the system memory is a volatile RW memory, such as a random access memory (RAM). The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the processes of the invention are stored in the system memory 1215, the permanent storage unit 1225 and/or the ROM 1220. From these various memory units, the processor 1210 retrieves instructions to execute and data to process in order to perform the processes of the invention.
The bus 1205 also connects to the input and output units 1230 and 1235. The input unit enables the user to communicate information and select commands to the computer system. The input unit 1230 includes an alphanumeric keyboard and a pointing device. The output units 1235 display images generated by the computer system; for example, these units may display design layouts for IC circuits. The output units include printers and display devices, such as cathode ray tube (CRT) monitors or LCD monitors. Finally, as shown in Fig. 12, the bus 1205 also connects the computer 1200 to a network 1265 through a network adapter (not shown). In this way, the computer can be part of a network of computers (such as a local area network ("LAN"), a wide area network ("WAN") or an intranet) or a network of networks (such as the Internet). Some or all of the components of the computer system 1200 may be used in connection with the invention. However, one of ordinary skill in the art will appreciate that other system configurations may also be used in conjunction with the present invention.

Some embodiments described above relate to video frames in YUV format. However, one of ordinary skill in the art will recognize that these embodiments may also relate to a variety of formats other than YUV. In addition, other video image formats (such as RGB) can easily be transformed into the YUV format. Furthermore, embodiments of the present invention may relate to various video coding applications (e.g. DVD, digital storage media, television broadcasting, Internet streaming, communications, teleconferencing, etc.) in real time or non-real time. Embodiments of the present invention can also be used with video sequences employing different coding standards, such as H.263 and H.264 (also known as MPEG-4 Part 10).
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention may be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art will understand that the invention is not limited by the foregoing illustrative details, but rather is defined by the appended claims.
Claims (41) [1] A method comprising: calculating a number of motion vectors for a first uncoded video image; calculating a number of motion vectors for a second uncoded video image; calculating a particular value based on a cost value for the number of motion vectors for the first video image and a cost value for the number of motion vectors for the second video image, the cost value for a number of motion vectors for a video image being based on a number of bits required for encoding the motion vectors for the video image; determining whether there is a scene cut between the first video image and the second video image by determining whether the particular value satisfies a threshold value; and encoding the first and second video images based on whether there is a scene cut. [2] Method according to claim 1, in which there is no scene cut when the particular value does not satisfy the threshold value. [3] Method according to claim 1, in which the scene cut is present when the particular value satisfies the threshold value. [4] The method of claim 3, further comprising marking the second video image as a video image after the scene cut. [5] The method of claim 1, wherein the particular value is a ratio of the cost values of the motion vectors for the first and second video images, and wherein said determining whether the particular value satisfies the threshold value includes determining whether the ratio is less than the threshold value. [6] The method of claim 4, wherein said marking of the second video image comprises marking the second video image as an I frame. [7] The method of claim 3, further comprising marking the first video image as a video image before the scene cut. [8] The method of claim 1, wherein the cost value is further based on (i) a motion compensation error associated with the motion vectors of the video image and (ii) a Lagrange multiplier.
[9] The method of claim 1, wherein the threshold value varies monotonically based on a number of video images that have been processed in a sequence of video images including the first and second video images, and wherein the threshold value decreases as the number of video images that have been processed increases. [10] The method of claim 1, wherein the first video image comprises a plurality of sets of pixels, said calculating of the plurality of motion vectors for the first video image comprising calculating at most one motion vector for each set of pixels of the first video image. [11] The method of claim 1, wherein the second video image comprises a plurality of sets of pixels, said calculating of the plurality of motion vectors for the second video image comprising calculating at most one motion vector for each set of pixels of the second video image. [12] The method of claim 1, wherein the plurality of motion vectors for the first and second video images are calculated as if the first video image and the second video image were to be encoded as unidirectional motion compensated video images. [13] A method comprising: calculating a value based on a cost value of a number of motion vectors for a first video image and a cost value of a number of motion vectors for a second video image, the cost value of a number of motion vectors for a video image being based on a number of bits required to encode the number of motion vectors for the video image; and determining whether the value satisfies a threshold value for (i) determining whether there is a scene cut between the first video image and the second video image, and (ii) determining a number of bidirectional motion compensated video images in a video sequence to be encoded, the video sequence comprising the first and second video images.
[14] The method of claim 13, wherein the number of bidirectional motion compensated video images is based on at least one cost value of motion vectors which are calculated under the assumption that all video images after an initial video image in the video sequence will be encoded as unidirectional motion compensated video images in the video sequence. [15] The method of claim 13, further comprising encoding the video sequence using the determined number of bidirectional motion compensated video images. [16] The method of claim 15, wherein said calculating of the value comprises calculating at least one motion vector for the first video image and at least one motion vector for the second video image, the motion vectors being calculated in a first set of operations during a first pass through the video sequence, said encoding being performed in a second set of operations during a second pass through the video sequence, and wherein said encoding comprises using at least one motion vector calculated during the first set of operations. [17] The method of claim 15, wherein said calculating of the value comprises calculating at least one motion vector for the first video image and at least one motion vector for the second video image, the motion vectors being calculated in a first set of operations during a first pass through the video sequence, said encoding being performed in a second set of operations during a second pass through the video sequence, and wherein said encoding comprises using all motion vectors calculated during the first set of operations.
[18] A computer readable medium (1200) storing a computer program executable by a processor (1210), the computer program comprising sets of instructions for: calculating a plurality of motion vectors for a first unencoded video image; calculating a plurality of motion vectors for a second unencoded video image; calculating a particular value based on a cost value for the plurality of motion vectors for the first video image and a cost value for the plurality of motion vectors for the second video image, wherein a cost value for a plurality of motion vectors for a video image is based on a number of bits required to encode the motion vectors for the video image; determining whether there is a scene cut between the first video image and the second video image by determining whether the particular value satisfies a threshold value; and encoding the first and second video images based on whether there is a scene cut. [19] The computer readable medium (1200) of claim 18, wherein there is no scene cut when the particular value does not satisfy the threshold value. [20] The computer readable medium (1200) of claim 18, wherein the scene cut is present when the particular value satisfies the threshold value. [21] The computer readable medium (1200) of claim 20, wherein the computer program further comprises a set of instructions for marking the second video image as a video image after the scene cut. [22] The computer readable medium (1200) of claim 18, wherein the particular value is a ratio of the cost values of the motion vectors of the first and second video images, the set of instructions for determining whether the particular value satisfies the threshold value comprising a set of instructions for determining whether the ratio is less than the threshold value. [23] The computer readable medium (1200) of claim 21, wherein the set of instructions for marking the second video image comprises a set of instructions for marking the second video image as an I frame.
[24] The computer readable medium (1200) of claim 20, wherein the computer program further comprises a set of instructions for marking the first video image as a video image before the scene cut. [25] The computer readable medium (1200) of claim 18, wherein the cost value is further based on (i) a motion compensation error associated with the motion vectors of the video image and (ii) a Lagrange multiplier. [26] The computer readable medium (1200) of claim 18, wherein the threshold value varies monotonically based on a number of video images processed in a sequence of video images including the first and second video images, the threshold value decreasing as the number of video images processed increases. [27] The computer readable medium (1200) of claim 18, wherein the first video image comprises a plurality of sets of pixels, the set of instructions for calculating the plurality of motion vectors for the first video image comprising a set of instructions for calculating at most one motion vector for each set of pixels of the first video image. [28] The computer readable medium (1200) of claim 18, wherein the second video image comprises a plurality of sets of pixels, the set of instructions for calculating the plurality of motion vectors for the second video image comprising a set of instructions for calculating at most one motion vector for each set of pixels of the second video image. [29] The computer readable medium (1200) of claim 18, wherein the pluralities of motion vectors for the first and second video images are calculated as if the first video image and the second video image were to be encoded as unidirectional motion compensated video images.
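Claim 25 bases the cost value on a motion compensation error combined with a Lagrange multiplier, i.e. the usual rate–distortion form cost = MCE + λ · bits. A minimal sketch follows; the function name and the default λ are illustrative only (the patent does not fix a value):

```python
def motion_cost(mce, motion_vector_bits, lam=0.85):
    """Lagrangian motion cost for one video image (cf. claim 25).

    mce               -- motion compensation error of the image's pixel
                         blocks (e.g. a sum of absolute differences)
    motion_vector_bits -- number of bits required to encode the image's
                          motion vectors
    lam               -- Lagrange multiplier trading distortion against
                         rate (illustrative default)
    """
    return mce + lam * motion_vector_bits
```

With λ = 0 the cost reduces to the pure motion compensation error; larger λ penalizes images whose motion vectors are expensive to encode.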
[30] A computer readable medium storing a computer program executable on a processor, the computer program comprising sets of instructions for: calculating a value based on a cost value of a plurality of motion vectors for a first video image and a cost value of a plurality of motion vectors for a second video image, wherein the cost value for a plurality of motion vectors for a video image is based on a number of bits required to encode the plurality of motion vectors for the video image; and determining whether the value satisfies a threshold value for (i) determining whether there is a scene cut between the first video image and the second video image, and (ii) determining a number of bidirectional motion compensated video images in a video sequence to be encoded, the video sequence comprising the first and second video images. [31] The computer readable medium (1200) of claim 30, wherein the number of bidirectional motion compensated video images is based on at least one cost value of motion vectors calculated assuming that all video images after an initial video image in the video sequence will be encoded as unidirectional motion compensated video images in the video sequence. [32] The computer readable medium (1200) of claim 30, wherein the computer program further comprises a set of instructions for encoding the video sequence using the determined number of bidirectional motion compensated video images.
[33] The computer readable medium (1200) of claim 32, wherein the set of instructions for calculating the value comprises a set of instructions for calculating at least one motion vector for the first video image and at least one motion vector for the second video image, the motion vectors being calculated in a first set of operations during a first pass over the video sequence, said encoding being performed in a second set of operations during a second pass over the video sequence, and wherein the set of encoding instructions comprises a set of instructions for using at least one motion vector calculated during the first set of operations. [34] The computer readable medium (1200) of claim 32, wherein the set of instructions for calculating the value comprises a set of instructions for calculating at least one motion vector for the first video image and at least one motion vector for the second video image, the motion vectors being calculated in a first set of operations during a first pass over the video sequence, said encoding being performed in a second set of operations during a second pass over the video sequence, and wherein the set of encoding instructions comprises a set of instructions for using all motion vectors calculated during the first set of operations. [35] The method of claim 1, wherein the first and second video images comprise a type of content, wherein the threshold value is based on the type of content in the first and second video images, and wherein different threshold values are used for different types of content. [36] The method of claim 35, wherein the type of content comprises one of sports content, soap opera content, news report content and video conferencing content. [37] The method of claim 12, wherein a unidirectional motion compensated video image is a P frame. [38] The method of claim 13, wherein a bidirectional motion compensated video image is a B frame.
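Claims 16/17 and 33/34 describe a two-pass scheme in which motion vectors computed during the first pass over the video sequence are cached and reused when encoding in the second pass. A schematic sketch, assuming caller-supplied `estimate_mv` and `encode_frame` callables (both hypothetical placeholders for a real motion estimator and encoder):

```python
def first_pass(frames, estimate_mv):
    """First pass: compute and cache one motion-vector result per frame."""
    return [estimate_mv(frame) for frame in frames]

def second_pass(frames, cached_mvs, encode_frame):
    """Second pass: encode each frame, reusing the motion vectors
    computed during the first pass instead of re-estimating them."""
    return [encode_frame(frame, mv) for frame, mv in zip(frames, cached_mvs)]
```

The design point carried by these claims is that motion estimation, typically the most expensive encoder stage, is not repeated: the second pass consumes some (claims 16/33) or all (claims 17/34) of the first-pass vectors.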
[39] The method of claim 1, wherein the cost value of the motion vectors of a video image is also based on motion compensated errors for encoding the video image. [40] The method of claim 13, wherein the value is also based on motion compensated errors for encoding the first video image and motion compensated errors for encoding the second video image. [41] The computer readable medium of claim 30, wherein the value is also based on motion compensated errors for encoding the first video image and motion compensated errors for encoding the second video image.
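Per the summary, the number of B frames (N_B) to be encoded between I or P frames grows for as long as the derived cost value stays below the predetermined threshold. A minimal sketch of that counting loop (the names and values are illustrative, not from the patent):

```python
def count_b_frames(derived_costs, threshold_value):
    """Count how many consecutive video images can be encoded as B frames:
    N_B increases while the derived cost value stays below the threshold,
    and stops growing at the first image whose cost reaches it."""
    nb = 0
    for cost in derived_costs:
        if cost < threshold_value:
            nb += 1
        else:
            break
    return nb
```

For instance, with derived costs [1.0, 2.0, 9.0, 1.0] and a threshold of 5.0, the first two images qualify as B frames and the spike at the third image ends the run, giving N_B = 2.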
Patent family:
Publication number | Publication date US20050053135A1|2005-03-10| EP1665801A1|2006-06-07| WO2005027526A1|2005-03-24| SE529152C2|2007-05-15| EP1942678A1|2008-07-09| US7856059B2|2010-12-21| EP1942678B1|2014-08-20| SE0500806L|2005-07-07| SE0700732L|2007-03-26| US20080043847A1|2008-02-21| US7295612B2|2007-11-13|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title GB2263602B|1992-01-24|1995-06-21|Sony Broadcast & Communication|Motion compensated video signal processing| US5592226A|1994-01-26|1997-01-07|Btg Usa Inc.|Method and apparatus for video data compression using temporally adaptive motion interpolation| US5774593A|1995-07-24|1998-06-30|University Of Washington|Automatic scene decomposition and optimization of MPEG compressed video| US6057893A|1995-12-28|2000-05-02|Sony Corporation|Picture encoding method, picture encoding apparatus, picture transmitting method and picture recording medium| US6456328B1|1996-12-18|2002-09-24|Lucent Technologies Inc.|Object-oriented adaptive prefilter for low bit-rate video systems| US6005626A|1997-01-09|1999-12-21|Sun Microsystems, Inc.|Digital video signal encoder and encoding method| US6281942B1|1997-08-11|2001-08-28|Microsoft Corporation|Spatial and temporal filtering mechanism for digital motion video signals| US6307886B1|1998-01-20|2001-10-23|International Business Machines Corp.|Dynamically determining group of picture size during encoding of video sequence| FR2783388B1|1998-09-15|2000-10-13|Thomson Multimedia Sa|Image compression method and device for carrying out said method| CN1255021A|1998-11-23|2000-05-31|Hewlett-Packard Company|Device and method for modifying compressed image without calculating movement vectors again| US6618507B1|1999-01-25|2003-09-09|Mitsubishi Electric Research Laboratories, Inc|Methods of feature extraction of video sequences| US7003038B2|1999-09-27|2006-02-21|Mitsubishi Electric Research Labs., Inc.|Activity descriptor for video sequences| JP3757088B2|1999-10-26|2006-03-22|NEC Corporation|Moving picture coding apparatus and method| JP2002101416A|2000-09-25|2002-04-05|Fujitsu Ltd|Image controller| US6731821B1|2000-09-29|2004-05-04|Hewlett-Packard Development Company, L.P.|Method for enhancing compressibility and visual quality of scanned document images| US7058130B2|2000-12-11|2006-06-06|Sony Corporation|Scene change detection|
US7203238B2|2000-12-11|2007-04-10|Sony Corporation|3:2 Pull-down detection| US6934335B2|2000-12-11|2005-08-23|Sony Corporation|Video encoder with embedded scene change and 3:2 pull-down detections| US6944224B2|2002-08-14|2005-09-13|Intervideo, Inc.|Systems and methods for selecting a macroblock mode in a video encoder| US7149250B2|2002-10-16|2006-12-12|Koninklijke Philips Electronics N.V.|Video encoding method| US7609763B2|2003-07-18|2009-10-27|Microsoft Corporation|Advanced bi-directional predictive coding of video frames| US7430335B2|2003-08-13|2008-09-30|Apple Inc|Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering| US7295612B2|2003-09-09|2007-11-13|Apple Inc.|Determining the number of unidirectional and bidirectional motion compensated frames to be encoded for a video sequence and detecting scene cuts in the video sequence|US8111754B1|2001-07-11|2012-02-07|Dolby Laboratories Licensing Corporation|Interpolation of video compression frames| US7266150B2|2001-07-11|2007-09-04|Dolby Laboratories, Inc.|Interpolation of video compression frames| WO2003063498A1|2002-01-22|2003-07-31|Koninklijke Philips Electronics N.V.|Reducing bit rate of already compressed multimedia| US7295612B2|2003-09-09|2007-11-13|Apple Inc.|Determining the number of unidirectional and bidirectional motion compensated frames to be encoded for a video sequence and detecting scene cuts in the video sequence| US8249113B2|2004-03-19|2012-08-21|Broadlogic Network Technologies, Inc.|Method and system for providing faster channel switching in a digital broadcast system| US20050232497A1|2004-04-15|2005-10-20|Microsoft Corporation|High-fidelity transcoding| US7728909B2|2005-06-13|2010-06-01|Seiko Epson Corporation|Method and system for estimating motion and compensating for perceived motion blur in digital video| US20070274385A1|2006-05-26|2007-11-29|Zhongli He|Method of increasing coding efficiency and 
reducing power consumption by on-line scene change detection while encoding inter-frame| US20080025408A1|2006-07-31|2008-01-31|Sam Liu|Video encoding| US7996666B2|2007-09-04|2011-08-09|Apple Inc.|User influenced loading sequence of startup applications| EP2191651A1|2007-09-28|2010-06-02|Dolby Laboratories Licensing Corporation|Video compression and tranmission techniques| US8457958B2|2007-11-09|2013-06-04|Microsoft Corporation|Audio transcoder using encoder-generated side information to transcode to target bit-rate| US8908765B2|2007-11-15|2014-12-09|General Instrument Corporation|Method and apparatus for performing motion estimation| US8218633B2|2008-06-18|2012-07-10|Kiu Sha Management Limited Liability Company|Bidirectionally decodable Wyner-Ziv video coding| US20100008419A1|2008-07-10|2010-01-14|Apple Inc.|Hierarchical Bi-Directional P Frames| CN102113326A|2008-08-04|2011-06-29|杜比实验室特许公司|Overlapped block disparity estimation and compensation architecture| KR101279573B1|2008-10-31|2013-06-27|에스케이텔레콤 주식회사|Motion Vector Encoding/Decoding Method and Apparatus and Video Encoding/Decoding Method and Apparatus| US8396114B2|2009-01-29|2013-03-12|Microsoft Corporation|Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming| US8311115B2|2009-01-29|2012-11-13|Microsoft Corporation|Video encoding using previously calculated motion information| US8737475B2|2009-02-02|2014-05-27|Freescale Semiconductor, Inc.|Video scene change detection and encoding complexity reduction in a video encoder system having multiple processing devices| TWI463878B|2009-02-19|2014-12-01|Sony Corp|Image processing apparatus and method| EP2224745B1|2009-02-27|2019-11-06|STMicroelectronics Srl|Temporal scalability in case of scene changes| US8270473B2|2009-06-12|2012-09-18|Microsoft Corporation|Motion based dynamic resolution multiple bit rate video encoding| US8705616B2|2010-06-11|2014-04-22|Microsoft Corporation|Parallel multiple bitrate video 
encoding to reduce latency and dependences between groups of pictures| US9032467B2|2011-08-02|2015-05-12|Google Inc.|Method and mechanism for efficiently delivering visual data across a network| US9591318B2|2011-09-16|2017-03-07|Microsoft Technology Licensing, Llc|Multi-layer encoding and decoding| US9094684B2|2011-12-19|2015-07-28|Google Technology Holdings LLC|Method for dual pass rate control video encoding| US11089343B2|2012-01-11|2021-08-10|Microsoft Technology Licensing, Llc|Capability advertisement, configuration and control for video coding and decoding| US8948529B1|2012-10-30|2015-02-03|Google Inc.|Multi-pass encoding| US10116943B2|2013-10-16|2018-10-30|Nvidia Corporation|Adaptive video compression for latency control| JP6335504B2|2013-12-20|2018-05-30|キヤノン株式会社|Image processing apparatus, image processing method, and program| US20150208072A1|2014-01-22|2015-07-23|Nvidia Corporation|Adaptive video compression based on motion| CN103974068B|2014-05-07|2017-07-07|电子科技大学|A kind of method that video size based on content reduces| US10951875B2|2018-07-03|2021-03-16|Raxium, Inc.|Display processing circuitry|
Legal status:
Priority:
Application number | Filing date | Patent title US10/658,938|US7295612B2|2003-09-09|2003-09-09|Determining the number of unidirectional and bidirectional motion compensated frames to be encoded for a video sequence and detecting scene cuts in the video sequence| PCT/US2004/015032|WO2005027526A1|2004-05-13|Video encoding method and scene cut detection method|